Effective pruning for XML structural match queries

Authors

  • Yefei Xin
  • Zhen He
  • Jinli Cao

Abstract

Extensible Markup Language (XML) is becoming the de facto standard for exchanging information over the Internet, and the resulting proliferation of XML documents has drawn increasing interest from the research community. One of the main challenges is processing large collections of XML documents efficiently. Most current methods suffer from two drawbacks: they cannot complement one another to further improve query processing performance without modifying the existing query processing engine, and they cannot be customized for different structural and usage characteristics. This paper presents a new approach to structural query processing, the Property-Driven Pruning Algorithm (PDPA), which overcomes both drawbacks through two features: independence from the underlying structural query processing method, and plug-and-play properties. PDPA consists of two phases. In the offline phase, a list of pruning properties is added to the original XML documents. In the online phase, each input query is augmented with a list of carefully selected properties, which are used during query processing to quickly prune candidate documents that cannot match. We propose two algorithms: an exhaustive algorithm and a greedy heuristic. Experimental results for both algorithms show that PDPA can improve XML query processing performance in a variety of situations, by up to a factor of two.
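To make the two-phase idea concrete, here is a minimal Python sketch. The abstract does not say which pruning properties PDPA stores, how queries are represented, or how properties are selected, so everything below is an assumption for illustration. The two properties (the set of element tags and the maximum nesting depth), the tuple-based query pattern with parent-child edges only, and the helper names `compute_properties`, `query_properties`, `may_match`, and `evaluate` are all hypothetical. The pruning test only rejects documents that provably cannot match; surviving documents still go through the full structural matcher.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical query pattern format: (tag, [child patterns]), parent-child edges only.


@dataclass(frozen=True)
class Properties:
    tags: frozenset  # all element tags present (hypothetical property)
    depth: int       # maximum nesting depth (hypothetical property)


def compute_properties(root):
    """Offline phase (sketch): summarize a parsed document into pruning properties."""
    tags = set()

    def walk(elem, level):
        tags.add(elem.tag)
        return max([level] + [walk(child, level + 1) for child in elem])

    depth = walk(root, 1)
    return Properties(frozenset(tags), depth)


def query_properties(pattern):
    """Online phase (sketch): properties that any matching document must have."""
    tag, children = pattern
    subs = [query_properties(c) for c in children]
    tags = frozenset({tag}).union(*(s.tags for s in subs))
    depth = 1 + max((s.depth for s in subs), default=0)
    return Properties(tags, depth)


def may_match(doc, query):
    """Conservative test: returns False only if the document cannot contain a match."""
    return query.tags <= doc.tags and query.depth <= doc.depth


def matches_at(elem, pattern):
    """Full structural check of the pattern rooted at elem."""
    tag, children = pattern
    if elem.tag != tag:
        return False
    return all(any(matches_at(c, p) for c in elem) for p in children)


def matches(root, pattern):
    """True if the pattern matches at any element of the document."""
    return any(matches_at(e, pattern) for e in root.iter())


def evaluate(docs, pattern):
    """Return (matching document names, number of documents fully checked)."""
    q = query_properties(pattern)
    results, checked = [], 0
    for name, (root, props) in docs.items():
        if not may_match(props, q):
            continue  # pruned: the structural matcher never runs on this document
        checked += 1
        if matches(root, pattern):
            results.append(name)
    return results, checked


raw = {
    "a.xml": "<book><title/><author><name/></author></book>",
    "b.xml": "<article><title/></article>",
    "c.xml": "<book><title/></book>",
    "d.xml": "<book><author/><x><name/></x></book>",
}
# Offline: parse each document once and store its properties alongside it.
docs = {}
for name, text in raw.items():
    root = ET.fromstring(text)
    docs[name] = (root, compute_properties(root))

query = ("book", [("author", [("name", [])])])
print(evaluate(docs, query))  # (['a.xml'], 2)
```

In this toy run, `b.xml` and `c.xml` are rejected by the tag-set check alone. `d.xml` passes the property check but fails the full match, which illustrates why a pruning step of this kind has to be conservative (no false negatives) rather than exact.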

Similar resources

StreamTX: extracting tuples from streaming XML data

We study the problem of extracting flattened tuple data from streaming, hierarchical XML data. Tuple-extraction queries are essentially XML pattern queries with multiple extraction nodes. Their typical applications include mapping-based XML transformation and integrated (set-based) processing of XML and relational data. Holistic twig joins are known for the optimal matching of XML pattern queri...

Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML

As we move to a big data world, in which data grows in unstructured form and at high velocity, there is a need for data stores that can hold this large volume of data. Traditionally, relational databases have been used, but they are not suited to handling such large amounts of data, so it is necessary to move to non-relational data stores. In the current study, we have proposed an extension of the Mongo...

Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching

In recent years, XML data has been exchanged more and more often in organizations and business sectors, so there is an increasing need for effective and efficient processing of queries on XML data. This paper presents a broad analysis of the efficiency of XML tree pattern matching algorithms. In previous years, many methods have been proposed to match XML tree queries efficiently. In particular, TwigStack, Ord...

Early Profile Pruning on XML-aware Publish/Subscribe Systems

Publish-subscribe applications are an important class of content-based dissemination systems in which message transmission is determined by the message content rather than by a destination IP address. With the increasing use of XML as the standard format in many Internet-based applications, XML-aware pub-sub applications have become necessary. In such systems, the messages (generated by publishers) are...

Journal:
  • Data & Knowledge Engineering (Data Knowl. Eng.)

Volume: 69

Publication date: 2010